Contour regression: A general approach to dimension reduction
We propose a novel approach to sufficient dimension reduction in regression,
based on estimating contour directions of small variation in the response.
These directions span the orthogonal complement of the minimal space relevant
for the regression and can be extracted according to two measures of variation
in the response, leading to simple and general contour regression (SCR and GCR)
methodology. In comparison with existing sufficient dimension reduction
techniques, this contour-based methodology guarantees exhaustive estimation of
the central subspace under ellipticity of the predictor distribution and mild
additional assumptions, while maintaining √n-consistency and computational
ease. Moreover, it proves robust to departures from ellipticity. We establish
population properties for both SCR and GCR, and asymptotic properties for SCR.
Simulations to compare performance with that of standard techniques such as
ordinary least squares, sliced inverse regression, principal Hessian directions
and sliced average variance estimation confirm the advantages anticipated by
the theoretical analyses. We demonstrate the use of contour-based methods on a
data set concerning soil evaporation.
Comment: Published at http://dx.doi.org/10.1214/009053605000000192 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
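The contour idea can be illustrated with a minimal numpy sketch of simple contour regression, under our reading of the construction: standardize the predictors, keep pairs of observations whose responses differ by at most a tolerance `c`, average the outer products of their standardized predictor differences, and take the eigenvectors belonging to the `d` smallest eigenvalues. The function name, the tolerance `c` and the assumed dimension `d` are illustrative choices, not the authors' code; no rule for choosing `c` is attempted.

```python
import numpy as np

def simple_contour_regression(X, y, c, d):
    """Sketch of simple contour regression (SCR).

    X: (n, p) predictors, y: (n,) response, c: contour tolerance,
    d: assumed dimension of the central subspace.
    Returns a (p, d) matrix whose columns span the estimated subspace.
    """
    n, p = X.shape
    # Standardize the predictors: Z = Sigma^{-1/2} (X - mean).
    sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ sigma_inv_sqrt

    # Keep pairs (i, j) whose responses differ by at most c ("contour" pairs).
    i, j = np.triu_indices(n, k=1)
    keep = np.abs(y[i] - y[j]) <= c
    D = Z[i[keep]] - Z[j[keep]]

    # Average outer product of predictor differences over contour pairs.
    K = D.T @ D / D.shape[0]

    # Along contour directions (where y barely changes) D spreads freely,
    # while along the central subspace it is constrained, so the central
    # subspace corresponds to the d smallest eigenvalues of K.
    _, kvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    gamma = kvecs[:, :d]
    # Map back from the standardized scale to the original predictors.
    return sigma_inv_sqrt @ gamma
```

For example, with five standard normal predictors and `y = X[:, 0] + 0.1 * noise`, the single returned direction should point almost exactly along the first coordinate axis (up to sign).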
Modeling a Decentralized Asset Market: An Introduction to the Financial "Toy-Room"
In this paper, we describe a micro-founded simulation environment for decentralized trade in financial assets. Within the philosophy of computer-simulated "artificial markets", this environment allows one to experiment in a modular fashion with (i) individual characterizations in terms of behaviors and learning, (ii) different architectural and institutional traits of the market, and (iii) time-embedding of events at the system and the individual level.
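The abstract describes the Toy-Room only at the architectural level. To make the modular idea concrete, here is a hypothetical Python sketch, not the Toy-Room itself: agents are paired at random each period (decentralized bilateral trade), a pluggable `behavior` function sets their quotes, and one unit changes hands at the midpoint price when the bid is at least the ask. `Agent`, `truthful`, `run_market` and the midpoint rule are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Agent:
    cash: float
    units: int
    valuation: float  # hypothetical private value of one unit of the asset

# A behavior maps (agent, rng) -> quoted price. Swapping it changes the agents
# without touching the trading mechanism (the modularity idea).
Behavior = Callable[[Agent, random.Random], float]

def truthful(agent: Agent, rng: random.Random) -> float:
    """Hypothetical behavior: always quote the private valuation."""
    return agent.valuation

def run_market(agents: List[Agent], behavior: Behavior, periods: int,
               seed: int = 0) -> List[Tuple[int, float]]:
    """Random bilateral meetings each period; returns (period, price) per trade."""
    rng = random.Random(seed)
    trades = []
    for t in range(periods):  # system-level time index for events
        order = list(range(len(agents)))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            buyer, seller = agents[a], agents[b]
            if rng.random() < 0.5:  # random role assignment
                buyer, seller = seller, buyer
            bid, ask = behavior(buyer, rng), behavior(seller, rng)
            price = (bid + ask) / 2
            # Trade one unit only if feasible and mutually acceptable.
            if seller.units > 0 and bid >= ask and buyer.cash >= price:
                buyer.cash -= price
                buyer.units += 1
                seller.cash += price
                seller.units -= 1
                trades.append((t, price))
    return trades
```

Because cash and units only move between agents, total cash and total units are conserved, which is a convenient sanity check when swapping in other behaviors.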
Probabilistic K-mean with local alignment for clustering and motif discovery in functional data
We develop a new method to locally cluster curves and discover functional
motifs, i.e., typical "shapes" that may recur several times along and across
the curves, capturing important local characteristics. In order to identify
these shared curve portions, our method leverages ideas from functional data
analysis (joint clustering and alignment of curves), bioinformatics (local
alignment through the extension of high similarity seeds) and fuzzy clustering
(curves belonging to more than one cluster, if they contain more than one
typical "shape"). It can employ various dissimilarity measures and
incorporate derivatives in the discovery process, thus exploiting complex
facets of shapes. We demonstrate the performance of our method with an
extensive simulation study, and show how it generalizes other clustering
methods for functional data. Finally, we provide real data applications to
Berkeley growth data, Italian Covid-19 death curves and "Omics" data related
to mutagenesis.
Comment: 22 pages, 6 figures. This work has been presented at various
conferences.
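Two ingredients named above, local alignment of a motif along a curve (optionally using derivatives) and fuzzy membership of curves in several clusters, can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the authors' probKMA implementation: the mean squared dissimilarity, the derivative weight `alpha`, exhaustive sliding over integer shifts, and the fuzzy c-means style membership rule with fuzzifier `m` are all choices made here for clarity.

```python
import numpy as np

def local_distance(curve, motif, shift, alpha=0.0):
    """Mean squared distance between the motif and the curve portion at `shift`.

    alpha in [0, 1] weights the first-derivative term (alpha=0: levels only).
    """
    L = len(motif)
    piece = curve[shift:shift + L]
    d0 = np.mean((piece - motif) ** 2)
    d1 = np.mean((np.gradient(piece) - np.gradient(motif)) ** 2)
    return (1 - alpha) * d0 + alpha * d1

def best_alignment(curve, motif, alpha=0.0):
    """Slide the motif along the curve; return (best_shift, best_distance)."""
    shifts = range(len(curve) - len(motif) + 1)
    dists = [local_distance(curve, motif, s, alpha) for s in shifts]
    s = int(np.argmin(dists))
    return s, dists[s]

def fuzzy_memberships(D, m=2.0, eps=1e-12):
    """Fuzzy c-means style memberships from squared distances D (curves x motifs).

    Each row sums to 1; smaller distances give larger memberships.
    """
    W = np.maximum(D, eps) ** (-1.0 / (m - 1.0))
    return W / W.sum(axis=1, keepdims=True)
```

In a full alternating scheme one would compute `best_alignment` for every curve and motif, turn the resulting distance matrix into memberships, and then re-estimate the motifs from the aligned, membership-weighted curve portions; that last step is omitted here.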
Composite likelihood inference in a discrete latent variable model for two-way "clustering-by-segmentation" problems
We consider a discrete latent variable model for two-way data arrays, which
allows one to simultaneously produce clusters along one of the data dimensions
(e.g. exchangeable observational units or features) and contiguous groups, or
segments, along the other (e.g. consecutively ordered times or locations). The
model relies on a hidden Markov structure but, given its complexity, cannot be
estimated by full maximum likelihood. We therefore introduce composite
likelihood methodology based on considering different subsets of the data. The
proposed approach is illustrated by simulation, and with an application to
genomic data.
On the impact of serial dependence on penalized regression methods
This paper characterizes the impact of the covariates' serial dependence on the
non-asymptotic estimation error bound of penalized regressions (PRs). Focusing
on the direct relationship between the degree of cross-correlation of
covariates and the estimation error bound of PRs, we show that orthogonal or
weakly cross-correlated stationary AR processes can exhibit high spurious
correlations caused by serial dependence. In this respect, we study
analytically the density of sample cross-correlations in the case of two
orthogonal Gaussian AR(1) processes. Our results are validated by an extensive
simulation study. Furthermore, we introduce a new procedure to remedy spurious
correlations in a time series regime, applying PRs to pre-whitened (ARMA-filtered)
time series. We show that under mild assumptions our procedure allows
both to reduce the estimation error and to develop an effective forecasting
strategy. The estimation accuracy of our proposal is validated by means of
simulations and an empirical application based on a large monthly macroeconomic
dataset for the Euro Area economy.
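The spurious-correlation effect and the pre-whitening remedy can be seen in a small simulation. For two independent stationary Gaussian AR(1) series with coefficients φ1 and φ2, Bartlett's formula gives an approximate variance of the lag-0 sample cross-correlation of (1 + φ1φ2) / (n(1 − φ1φ2)), far above the 1/n expected for white noise when both coefficients are close to 1. The sketch below is only an illustration: it pre-whitens with a least-squares AR(1) fit instead of a general ARMA filter, and does not run a penalized regression; all function names are hypothetical.

```python
import numpy as np

def bartlett_var_ar1(phi1, phi2, n):
    """Approximate variance of the lag-0 sample cross-correlation of two
    independent Gaussian AR(1) series (Bartlett's formula)."""
    r = phi1 * phi2
    return (1 + r) / (n * (1 - r))

def simulate_ar1(phi, reps, n, rng, burn=200):
    """Simulate `reps` independent AR(1) series of length n (after burn-in)."""
    e = rng.standard_normal((reps, n + burn))
    x = np.zeros_like(e)
    for t in range(1, n + burn):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x[:, burn:]

def prewhiten_ar1(x):
    """Fit an AR(1) by least squares to each row and return the residuals."""
    x = x - x.mean(axis=1, keepdims=True)
    phi_hat = np.sum(x[:, 1:] * x[:, :-1], axis=1) / np.sum(x[:, :-1] ** 2, axis=1)
    return x[:, 1:] - phi_hat[:, None] * x[:, :-1]

def row_corr(a, b):
    """Sample correlation between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return np.sum(a * b, axis=1) / np.sqrt(np.sum(a ** 2, axis=1) * np.sum(b ** 2, axis=1))

rng = np.random.default_rng(1)
x = simulate_ar1(0.9, 500, 200, rng)
z = simulate_ar1(0.9, 500, 200, rng)
raw = row_corr(x, z).var()                                   # serially dependent
white = row_corr(prewhiten_ar1(x), prewhiten_ar1(z)).var()   # after filtering
print(bartlett_var_ar1(0.9, 0.9, 200), raw, white)
```

With φ1 = φ2 = 0.9 and n = 200 the Bartlett approximation is about 0.048, roughly ten times the white-noise value 1/200; the simulated variance before filtering should be of that order, and after pre-whitening close to 1/n.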
A general theory for nonlinear sufficient dimension reduction: Formulation and estimation
In this paper we introduce a general theory for nonlinear sufficient
dimension reduction, and explore its ramifications and scope. This theory
subsumes recent work employing reproducing kernel Hilbert spaces, and reveals
many parallels between linear and nonlinear sufficient dimension reduction.
Using these parallels we analyze the properties of existing methods and develop
new ones. We begin by characterizing dimension reduction at the general level
of σ-fields and proceed to that of classes of functions, leading to the
notions of sufficient, complete and central dimension reduction classes. We
show that, when it exists, the complete and sufficient class coincides with the
central class, and can be unbiasedly and exhaustively estimated by a
generalized sliced inverse regression estimator (GSIR). When completeness does
not hold, this estimator captures only part of the central class. However, in
these cases we show that a generalized sliced average variance estimator
(GSAVE) can capture a larger portion of the class. Both estimators require no
numerical optimization because they can be computed by spectral decomposition
of linear operators. Finally, we compare our estimators with existing methods
by simulation and on actual data sets.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1071 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
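GSIR and GSAVE act on classes of functions through kernel operators and are not reproduced here. As a hedged illustration of two points in the abstract, that estimation reduces to a spectral decomposition with no numerical optimization, and that a SAVE-type estimator can recover directions a SIR-type estimator misses, the sketch below implements only their classical linear counterparts, sliced inverse regression (SIR) and sliced average variance estimation (SAVE). The slicing into `h` equal-count slices and the function names are our choices.

```python
import numpy as np

def _standardize(X):
    """Return Z = Sigma^{-1/2} (X - mean) and Sigma^{-1/2}."""
    sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return (X - X.mean(axis=0)) @ inv_sqrt, inv_sqrt

def _slices(y, h):
    """Split observation indices into h (nearly) equal-count slices by y order."""
    return np.array_split(np.argsort(y), h)

def sir(X, y, d, h=10):
    """Linear SIR: leading eigenvectors of the weighted slice-mean outer products."""
    Z, inv_sqrt = _standardize(X)
    n, p = Z.shape
    M = np.zeros((p, p))
    for idx in _slices(y, h):
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)                  # ascending eigenvalues
    return inv_sqrt @ vecs[:, ::-1][:, :d]       # d leading directions

def save(X, y, d, h=10):
    """Linear SAVE: leading eigenvectors of sum_h p_h (I - Var(Z | slice h))^2."""
    Z, inv_sqrt = _standardize(X)
    n, p = Z.shape
    I = np.eye(p)
    M = np.zeros((p, p))
    for idx in _slices(y, h):
        V = np.cov(Z[idx], rowvar=False)
        M += len(idx) / n * (I - V) @ (I - V)
    _, vecs = np.linalg.eigh(M)
    return inv_sqrt @ vecs[:, ::-1][:, :d]
```

For a symmetric response such as y = x1² plus noise, the slice means of x1 are all near zero, so SIR has essentially no signal, while the slice variances of x1 change strongly with y and SAVE picks up the x1 direction; for a monotone response such as y = x1 plus noise, SIR already works.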